class: center, middle, inverse, title-slide

# Lecture 4
## Statistical Models and Notation
### Psych 10 C
### University of California, Irvine
### 04/06/2022

---

## Objective in research

- One of our main objectives in research is to contrast our beliefs about the world with the outcomes of experiments.

--

- We do so by starting with some "verbal" statement or belief about the world, which we then formalize using a statistical model.

--

- Statistical models will allow us to make predictions about future observations. In the case of an experiment, they will allow us to make predictions about the outcomes.

--

- The next step is to evaluate those predictions by comparing them with the outcomes (data) of the experiment.

--

- Finally, we would like to go back and interpret the results of our evaluations with respect to our original beliefs or statements about the world.

---

## Statistical models

- Statistical models are abstract representations of the world.

--

- They are a way in which we can formalize our beliefs about probabilistic events.

--

- For example, suppose we have an experiment where we toss a coin, and we have two competing ideas about the coin:

--

- The coin is **fair**.

--

- The coin is **not fair**.

--

- We can formalize these two beliefs using a statistical model.

--

- The coin is fair: `\(P(\{heads\})\ =\ P(\{tails\})\ =\ 0.5\)`

--

- The coin is not fair: `\(P(\{heads\})\ \neq\ P(\{tails\})\)`

--

- We moved from two verbal statements about our beliefs regarding the coin to two formal statements about the probability of "heads".

---

## Statistical models

- Statistical models are the formal representation of our beliefs or hypotheses about the outcomes of an experiment.

--

- Given that we assume that the outcomes are probabilistic, our models will have a probabilistic component associated with them.

--

- Given the nature of our observations, it will be almost impossible for us to tell if a model is TRUE or FALSE.
However, we can compare how useful different models are in a given situation.

--

- Statistical models will allow us to make predictions about our observations, which we will then use to compare how useful the models are.

--

- However, before we continue it will be useful to introduce some notation!

--

- This will provide us with a way to express our models in a formal and standard way.

---
class: inverse, middle, center

# Notation

---

## Example:

- To introduce notation we will start with a problem.

--

- **Problem:** We want to know if people who smoke have lower lung capacity than people who do not smoke.

--

- We have a variable that we are interested in, which is lung capacity as measured by some standard test.

--

- We also have a variable that indicates whether a given participant smokes or not.

--

- We call the first one a **dependent** variable, because we want to see how it "depends" on the values of another.

--

- We call the smoker indicator variable an **independent** variable. We are interested in how our independent variable affects the values of our dependent variable.

--

- In other words, we want to know if lung capacity is a function of smoking status.

---

## Example: Smoking

- We collect data from 8 participants: 4 smokers and 4 non-smokers.
--

- We will denote values of our dependent variable using `\(y\)`. For example, the first observation of our first group (non-smokers) is denoted `\(y_{11}\)`, while the fourth observation of the same group is denoted `\(y_{41}\)`.

--

- In general, we say that the *i-th* observation of the *j-th* group is denoted `\(y_{ij}\)`. Note that the letters `\(i\)` and `\(j\)` stand for a general observation; if we want to look at a particular one we can write, for example, `\(y_{21}=\)` 77.6.

---

## Example: Memory Experiment

- Whenever we run an experiment, we have a variable that we are interested in. In homework one we had the results of an IQ test; in our memory example we had the number of words that are correctly recognized.

--

- We will call the variable that we are interested in the **dependent** variable.

--

- To reference a particular observation in an experiment we will use the lowercase letter `\(y\)` with a subscript, `\(y_{i}\)`.

--

- Here the subscript `\(i\)` indicates our *i-th* observation.

--

- Let's look back at the data for test one in our memory experiment.

---

## Example

- Now we will use the data from only one of the tests of our experiment; however, we have added a new variable that indicates the **age_group**.
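---

## Example: Notation in R

The subscript notation above maps naturally onto vector indexing in R. The following is a minimal sketch under stated assumptions: the names `y`, `group`, and `y_group2` and all of the values are made up for illustration and are not the data from our experiments.

```r
# Hypothetical scores, made up for illustration (not the experiment's data).
y <- c(12, 15, 9, 14, 11, 13)

# Hypothetical group labels: j = 1 for the first group, j = 2 for the second.
group <- c(1, 1, 1, 2, 2, 2)

# y_i: the i-th observation overall, here i = 2.
y[2]

# y_ij: the i-th observation within the j-th group, here i = 2 and j = 2.
y_group2 <- y[group == 2]
y_group2[2]
```

Here `y[2]` plays the role of `\(y_{2}\)`, while `y_group2[2]` plays the role of `\(y_{22}\)`.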
---

## Example

- In our experiment we have `\(100\)` observations; in other words, we have a sequence of `\(y\)` values `\(y_1, y_2, \dots, y_{100}\)`.

--

- Here `\(y_1\)` denotes our first observation; in this case, `\(y_1 =\)` 46.

--

- `\(y_{50}\)` denotes our *50-th* observation, or `\(y_{50} =\)` 43.

--

- When we have more than one group we can add a second subscript to our observations.

--

- For example, let's divide our memory data into two groups: participants who are 35 or younger will be classified as "young", while participants who are older than 35 will be classified as "elder".

```r
# Classify each participant as "young" (35 or younger) or "elder" (older than 35).
# Assumes dplyr is already loaded and that `memory` has an `age` column.
memory <- memory %>%
  mutate(age_group = ifelse(test = age > 35, yes = "elder", no = "young"))
```

---

## Example

- Now we have two groups and we can denote their observations using two subscripts.

--

- For example, the *i-th* observation of the "young" group can be denoted `\(y_{i1}\)`, while the *i-th* observation of the "elder" group will be `\(y_{i2}\)`.

--

- In general, we can say that the *i-th* observation of the *j-th* group will be `\(y_{ij}\)`.

--

- In our experiment, the first observation (number of correctly recognized words) of the first participant in the "young" group was `\(y_{11}=\)` 46.

--

- The first observation of the "elder" group was `\(y_{12}=\)` 43.

---

## Statistical models